-
Notifications
You must be signed in to change notification settings - Fork 31
Add new feature 'full-metadata' to dump all available metadata from pbf file #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…bf file With cfg='full-metadata' enabled, these fields are available in an 'info' field of Node/Way/Relation object: - version - timestamp, as an i64 number, in milliseconds since 1970 epoch - changeset - user id - user name - visible, set to true by default This is greatly inspired from PR #TeXitoi#40, with the addition of more fields.
src/objects.rs
Outdated
| pub decimicro_lon: i32, | ||
| #[cfg(feature = "full-metadata")] | ||
| /// Additional metadata | ||
| pub info: Option<Info>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a field in a struct with all field public break the "adding a feature is not breaking" contract. For example, if someone does
let Node { id, tags, decimicro_lat, decimicro_lon } = node;this code will not compile if the feature is activated.
We can keep the field and only populate it with an parameter, but the Info struct will make the different structs significantly bigger. We can use pub info: Option<Box<Info>>, to limit the augmented size, but it will impact the performance with and without the Info retrieval.
What do you propose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know of the rule that enabling a feature shouldn't break logic, but it makes sense.
There is also the possibility of adding functions like iter_with_metadata, but it might mean to duplicate most of the logic. Or maybe we could play with generics, but might be too complex.
I will do some trials with your proposition of info: Option<Box<Info>>.
In any case, does it mean that there will be a need to bump the major version of the package? My original thought was that adding a new feature would help to not break backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we modify a struct with all pub field, yes, it will need a version increment.
Another possiblitity would be to encapsulate in the other way:
struct WithInfo<T> {
obj: T,
info: Option<Info>,
}And iterate on it, with dedicated methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have launched examples/count.rs on france.osm.pbf, with Option<Box<Info>>, and it is slower than expected. These stats are generated with hyperfine
-
main
Time (mean ± σ): 18.375 s ± 1.249 s [User: 129.637 s, System: 4.216 s]
Range (min … max): 17.712 s … 21.905 s 10 runs -
this version, with full-metadata off
Time (mean ± σ): 18.196 s ± 0.307 s [User: 130.923 s, System: 4.132 s]
Range (min … max): 17.900 s … 18.967 s 10 runs -
this version, with full-metadata on
Time (mean ± σ): 24.424 s ± 0.363 s [User: 181.625 s, System: 5.445 s]
Range (min … max): 24.039 s … 25.108 s 10 runs -
this version, but using
NonZero<>inInfo, with full-metadata on
Time (mean ± σ): 21.686 s ± 0.255 s [User: 152.626 s, System: 4.924 s]
Range (min … max): 21.476 s … 22.326 s 10 runs -
Option<Box<Info>>, with full-metadata off
Time (mean ± σ): 19.162 s ± 0.440 s [User: 132.996 s, System: 4.502 s]
Range (min … max): 18.824 s … 20.170 s 10 runs -
Option<Box<Info>>, with full-metadata on
Time (mean ± σ): 35.523 s ± 0.286 s [User: 160.999 s, System: 5.323 s]
Range (min … max): 35.129 s … 35.998 s 10 runs -
Option<Box<Info>>, usingNonZero<>inInfo, with full-metadata on
Time (mean ± σ): 36.360 s ± 0.513 s [User: 162.979 s, System: 4.954 s]
Range (min … max): 35.727 s … 37.125 s 10 runs
What would you think of making the info field private, with a size depending on the feature full-metadata, and adding a function get_info() to get its value?
New functions are added to get full metadata: iter_with_metadata() and par_iter_with_metadata(), with same behaviour than iter() and par_iter(). They return ObjInfo objects, which is a node/way/relation with an optional Info field.
|
I’ve changed the code to add a I’ve also modified On the new code, performance is very good:
I think it is still worth keeping the optional feature, but we could remove it - I suspect that the additional logic in DenseNodes to update the |
Hi,
This commit adds a new config "full-metadata" to dump all Info fields from the PBF format, if they are available. I’ve kept them as
Option<>to align with how the proto file is defined - it defines most of the fields as optional.New available fields:
This is greatly inspired from PR ##40, with the addition of more fields.
I’ve launched some basic read tests using
par_iter(), with/without the config on France pbf, which is 4.4GB big. To read all nodes/ways/relations, without any additional computation, on a 2*6-cores cpu, it takes:I think it makes sense to keep the configurability, to not impact performance, but I can remove it if you prefer.